
13 research outputs found

Multilingual Neural Machine Translation System for Indic to Indic Languages

Author: Das Sudhansu Bala
Ekbal Asif
Mishra Tapas Kumar
Panda Divyajyoti
Patra Bidyut Kr.
Publication date: 22/06/2023

This paper presents an Indic-to-Indic (IL-IL) multilingual neural machine translation (MNMT) baseline model for 11 ILs, implemented on the Samanantar corpus and analyzed on the Flores-200 corpus. All the models are evaluated using the BLEU score. In addition, the languages are classified into three groups, namely East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI). The effect of language relatedness on MNMT model efficiency is studied. Owing to the presence of large corpora from English (EN) to ILs, MNMT IL-IL models using EN as a pivot are also built and examined. To achieve this, English-Indic (EN-IL) models are also developed, with and without the use of related languages. Results reveal that using related languages is beneficial for the WI group only, is detrimental for the EI group, and has an inconclusive effect on the DR group, but is useful for EN-IL models. Thus, related language groups are used to develop pivot MNMT models. Furthermore, the IL corpora are transliterated from the corresponding scripts to a modified ITRANS script, and the best MNMT models from the previous approaches are built on the transliterated corpus. It is observed that the use of pivot models greatly improves MNMT baselines, with AS-TA achieving the minimum BLEU score and PA-HI achieving the maximum score. Among languages, AS, ML, and TA achieve the lowest BLEU scores, whereas HI, PA, and GU perform the best. Transliteration also helps the models, with few exceptions. Across all languages, the best increment of scores is observed in ML, TA, and BN, and the worst average increment is observed in KN, HI, and PA. The best model obtained is the PA-HI language pair trained on the PAWI transliterated corpus, which gives 24.29 BLEU.
Comment: 38 pages, 2 figures

arXiv.org e-Print Archive

Optimization Matrix Factorization Recommendation Algorithm Based on Rating Centrality

Author: Bidyut Kr. Patra
F Cacheda
F Li
F Ricci
G Adomavicius
JL Herlocker
L Chen
M Grčar
Y Cai
Y Koren
Y Zhou
Zan Huang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/06/2018

Matrix factorization (MF) is extensively used to mine user preferences from explicit ratings in recommender systems. However, the reliability of explicit ratings is not always consistent, because many factors may affect a user's final evaluation of an item, including commercial advertising and a friend's recommendation. Therefore, mining a user's reliable ratings is critical to further improving the performance of the recommender system. In this work, we analyze the deviation degree of each rating in the overall rating distribution of the user and of the item, and propose the notions of user-based rating centrality and item-based rating centrality, respectively. Moreover, based on the rating centrality, we measure the reliability of each user rating and provide an optimized matrix factorization recommendation algorithm. Experimental results on two popular recommendation datasets reveal that our method performs better than other matrix factorization recommendation algorithms, especially on sparse datasets.

arXiv.org e-Print Archive

Crossref

An approach to compute user similarity for GPS applications

Author: Agrawal
Backstrom
Bao
Bidyut Kr. Patra
Bray
Cao
Chen
Chen
Chen
Chen
Fire
Gao
Giannotti
Giannotti
Han
Hopfgartner
Huo
Jeung
Jin
Li
Li
Lu
Lv
Ma
Pei
Pramit Mazumdar
Russell Lock
Sarwar
Sathya Babu Korra
Wang
Wang
Xiao
Xiao
Xue
Ying
Ying
Zaki
Zheng
Zheng
Zheng
Zheng
Zheng
Zheng
Publication venue: 'Elsevier BV'

Crossref

A knowledge reuse framework for improving novelty and diversity in recommendations

Author: Pathak Apurva
Patra Bidyut Kr.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Crossref

VTT Research System

Effective data summarization for hierarchical clustering in large datasets

Author: Nandi Sukumar
Patra Bidyut Kr.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

VTT Research System

A distance based clustering method for arbitrary shaped clusters in large datasets. Pattern Recognition 44(12

Author: Bidyut Kr Patra
P Viswanath
Sukumar Nandi
Publication date: 01/01/2011

Clustering has been widely used in different fields of science, technology, social science, etc. Naturally, clusters in a dataset have arbitrary (non-convex) shapes. One important class of clustering methods is distance based. However, distance based clustering methods usually find clusters of convex shapes. Classical single-link is a distance based clustering method that can find arbitrary shaped clusters. It scans the dataset multiple times and has a time requirement of O(n^2), where n is the size of the dataset. This is potentially a severe problem for a large dataset. In this paper, we propose a distance based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to a dataset to derive a set of leaders; subsequently, the single-link method (with distance stopping criteria) is applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering. It is considerably faster than the single-link method applied to the dataset directly. The clustering result of l-SL may deviate nominally from the final clustering of the single-link method (with distance stopping criteria) applied to the dataset directly. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real world and synthetic datasets. Experimental results show the effectiveness of the proposed clustering methods for large datasets.
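
The two-stage idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the leader threshold `tau`, the stopping distance `h`, and the rule that a point joins the first leader within `tau` are all assumptions made for illustration, and the improvement method mentioned in the abstract is not covered. Single-link with a distance stopping criterion is realized here as connected components over leaders that lie within `h` of each other.

```python
from math import dist


def leaders(points, tau):
    """One scan: a point follows the first leader within tau, else becomes a leader."""
    leader_idx = []   # indices (into points) of the chosen leaders
    assignment = []   # assignment[i] = position in leader_idx of point i's leader
    for i, p in enumerate(points):
        for k, li in enumerate(leader_idx):
            if dist(p, points[li]) <= tau:
                assignment.append(k)
                break
        else:
            leader_idx.append(i)
            assignment.append(len(leader_idx) - 1)
    return leader_idx, assignment


def single_link_cut(points, h):
    """Single-link stopped at distance h: components of the 'distance <= h' graph."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if dist(points[a], points[b]) <= h:
                parent[find(a)] = find(b)
    return [find(x) for x in range(len(points))]


def l_sl(points, tau, h):
    """Cluster the leaders only, then let every point inherit its leader's label."""
    leader_idx, assignment = leaders(points, tau)
    leader_labels = single_link_cut([points[i] for i in leader_idx], h)
    return [leader_labels[k] for k in assignment]


pts = [(0, 0), (0.1, 0), (1, 0), (1.1, 0), (10, 0), (10.1, 0)]
labels = l_sl(pts, tau=0.5, h=2.0)
```

The quadratic pairwise step runs only over the leaders, which is where the speed-up over running single-link on the full dataset would come from when there are far fewer leaders than points.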

CiteSeerX

A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data

Author: Launonen Raimo
Nandi Sukumar
Ollikainen Ville
Patra Bidyut Kr.
Publication date: 01/01/2015

Collaborative filtering (CF) is the most successful approach for personalized product or service recommendations. Neighborhood-based collaborative filtering is an important class of CF: it is a simple, intuitive, and efficient approach to product recommendation that is widely used in the commercial domain. Typically, neighborhood-based CF uses a similarity measure to find users similar to an active user, or items similar to those she has rated. Traditional similarity measures use only the ratings of co-rated items when computing the similarity between a pair of users. Therefore, these measures are not suitable for sparse data. In this paper, we propose a similarity measure for neighborhood-based CF that uses all ratings made by a pair of users. The proposed measure determines the importance of each pair of rated items by exploiting Bhattacharyya similarity. To show the effectiveness of the measure, we compare the performance of neighborhood-based CF using state-of-the-art similarity measures with that of CF based on the proposed measure. Recommendation results on a set of real data show that CF based on the proposed measure outperforms CF based on existing measures on various evaluation metrics.
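
As a rough illustration of the Bhattacharyya similarity mentioned in this abstract, the sketch below computes the standard Bhattacharyya coefficient, BC(p, q) = sum over h of sqrt(p_h * q_h), between the rating histograms of two items. The abstract does not say how the coefficient is combined into the full user-user similarity, so that step is not shown. The 1 to 5 rating scale, the function names, and the non-empty rating lists are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def rating_distribution(ratings, scale=(1, 2, 3, 4, 5)):
    """Relative frequency of each rating value (ratings must be non-empty)."""
    counts = Counter(ratings)
    total = len(ratings)
    return [counts[r] / total for r in scale]


def bhattacharyya_coefficient(ratings_i, ratings_j, scale=(1, 2, 3, 4, 5)):
    """BC = sum_h sqrt(p_h * q_h); 1 for identical distributions, 0 for disjoint ones."""
    p = rating_distribution(ratings_i, scale)
    q = rating_distribution(ratings_j, scale)
    return sum(sqrt(a * b) for a, b in zip(p, q))


same = bhattacharyya_coefficient([1, 1, 5, 5], [1, 5])
disjoint = bhattacharyya_coefficient([1, 1], [5, 5])
partial = bhattacharyya_coefficient([4, 4, 5, 5], [5, 5, 5, 5])
```

Because the coefficient depends only on each item's overall rating distribution, it can be computed for two items even when no single user has rated both, which is the property that makes it attractive for sparse data.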

VTT Research System

Critique on Natural Noise in Recommender Systems

Author: Amatriain Xavier
Beel Joeran
Bellogín Alejandro
Bobadilla J.
Choudhary Priyankar
Gallardo Jorge Castro
Jurdi Wissam Al
Kishan Kalitkar S.K.
Latha R.
Patra Bidyut Kr.
Pham Xuan Hau
Ricci Francesco
Said Alan
Toledo Raciel Yera
Yera Raciel
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Hidden location prediction using check-in patterns in location-based social networks

Author: Bidyut Kr. Patra
C Song
CC Robusto
GD Forney Jr
HC Sung
J Han
JJC Ying
Korra Sathya Babu
L Xin
LR Rabiner
M Erwig
M Fire
MJ Shaw
P Mazumdar
PA Devijver
Pramit Mazumdar
R Agrawal
Russell Lock
Z Yan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This is a post-peer-review, pre-copyedit version of an article published in Knowledge and Information Systems. The final authenticated version is available online at: https://doi.org/10.1007/s10115-018-1170-5
The check-in facility in a Location Based Social Network (LBSN) enables people to share location information as well as real-life activities. Analysing these historical series of check-ins to predict the locations to be visited in the future has been very popular in the research community. However, it has been found that people do not intend to share privately visited locations and activities in an LBSN. Research into extrapolating unchecked locations from historical data is limited. Knowledge of hidden locations can have a wide range of benefits to society. It may help investigating agencies identify possible places visited by a suspect, a marketing company select potential customers for targeted marketing, medical representatives identify areas for disease prevention and containment, etc. In this paper, we propose an Associative Location Prediction Model (ALPM), which infers privately visited unchecked locations from a published user trajectory. The proposed ALPM explores the association between a user's checked-in data, the Hidden Markov Model, and proximal locations around a published check-in to predict the unchecked or hidden locations. We evaluate ALPM on the real-world Gowalla LBSN dataset for users residing in Beijing, China. Experimental results show that the proposed model outperforms the existing state-of-the-art work in the literature.

Crossref

Loughborough University Institutional Repository